Evaluation of a generalized dynamic cepstrum in distant speech recognition

Authors

  • Hiroshi Matsumoto
  • Akihiko Shimizu
  • Kazumasa Yamamoto
Abstract

This paper examines the effectiveness of a generalized dynamic cepstrum in distant speech recognition. The generalized dynamic cepstrum (DyMFGC) is based on forward masking applied to the generalized logarithmic spectrum instead of the log-spectrum, which is intended to make it robust to additive noise as well as convolutional noise. Digit recognition tests were carried out in a relatively quiet, small office environment. Under white noise, the DyMFGC outperforms both the dynamic cepstrum computed on the logarithmic spectrum and the MFCC with cepstral mean normalization. It also maintains a word accuracy of 90% to 95% within a distance of 1 m from the source. In speech babble noise, the DyMFGC performs approximately the same as the dynamic cepstrum on the logarithmic amplitude scale.
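As a rough illustration of two ingredients named in the abstract, the sketch below shows a generalized logarithm and cepstral mean normalization. The form `(x**gamma - 1) / gamma` is the one commonly used in mel-generalized cepstral analysis; that the paper uses exactly this form, and the names `generalized_log`, `gamma`, and `cepstral_mean_normalization`, are assumptions made here for illustration. The forward masking step of the DyMFGC is not reproduced.

```python
import numpy as np


def generalized_log(x, gamma):
    """Generalized logarithm (assumed form): (x**gamma - 1) / gamma.

    Reduces to the natural logarithm when gamma == 0 and approaches it
    as gamma -> 0, so gamma interpolates between a logarithmic and a
    power-law compression of the spectrum.
    """
    x = np.asarray(x, dtype=float)
    if gamma == 0.0:
        return np.log(x)
    return (np.power(x, gamma) - 1.0) / gamma


def cepstral_mean_normalization(features):
    """Subtract the per-coefficient mean over time.

    `features` has shape (frames, coefficients). A fixed convolutional
    channel adds a constant offset in the cepstral domain, which this
    subtraction removes.
    """
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)


if __name__ == "__main__":
    # Hypothetical power values, chosen only to show the compression.
    spectrum = np.array([0.01, 0.1, 1.0, 10.0])
    print(generalized_log(spectrum, 0.0))   # plain log-spectrum
    print(generalized_log(spectrum, -0.3))  # hypothetical gamma value
```

With a nonzero `gamma`, the compression differs from the plain logarithm mainly at the extremes of the spectrum; the value that works best for a given noise condition is an empirical question that this sketch does not address.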


Similar resources

Robot Arm Performing Writing through Speech Recognition Using Dynamic Time Warping Algorithm

This paper aims to develop a writing robot that recognizes speech signals from the user. The robot arm is constructed mainly for disabled people who cannot write on their own. Here, the dynamic time warping (DTW) algorithm is used to recognize the speech signal from the user. The action performed by the robot arm in the environment is done by reducing the redundancy which frequently fac...


Text Independent Speaker Identification with Finite Multivariate Generalized Gaussian Mixture Model with Distant Microphone Speech

An effective and efficient speaker identification (SI) system requires a robust feature extraction module followed by a speaker modeling scheme for generalized representation of these features. In recent years, speaker identification has seen significant advancement, but improvements have tended to be benchmarked on near-field speech, ignoring the more realistic setting of far field instru...


Speaker Recognition Model Based on Generalized Gamma Distribution Using Compound Transformed Dynamic Feature Vector

In this paper, we present an efficient speaker identification system based on the generalized gamma distribution. This system comprises three basic operations, namely speech features classification and metrics for evaluation. The features extracted using MFCC are passed to shifted delta cepstral coefficients (SDC) and then applied to linear predictive coefficients (LPC) to have effective recogni...


Some advances on speech analysis using generalized dimensions

Nonlinear systems based on chaos theory can model various aspects of the nonlinear dynamic phenomena occurring during speech production. In this paper, we explore modern methods and algorithms from chaotic systems theory for modeling speech signals in a multidimensional phase space and extracting characteristic invariant measures such as the generalized fractal dimensions. Such measures can capt...


Deep neural network-based bottleneck feature and denoising autoencoder-based dereverberation for distant-talking speaker identification

Deep neural network (DNN)-based approaches have been shown to be effective in many automatic speech recognition systems. However, few works have focused on DNNs for distant-talking speaker recognition. In this study, a bottleneck feature derived from a DNN and a cepstral domain denoising autoencoder (DAE)-based dereverberation are presented for distant-talking speaker identification, and a comb...



Publication date: 2001